1 Introduction

write introduction here



2 Observed vs True data

In this section we will compare the observed with the true dataset.

Table 2.1: Observed Data
age smoke sex intensity active rest height weight bmi
42 no female high NA 75 NA NA 22.4
31 NA male low NA 62 NA NA 23.8
36 no male low 109 76 182 78.0 23.5
31 no female low 78 62 164 53.9 20.0
42 no male low NA 66 189 NA 23.4
Table 2.1: True Data
age smoke sex intensity active rest height weight bmi
42 no female high 94 75 161 58.1 22.4
31 no male low 86 62 184 80.6 23.8
36 no male low 109 76 182 78.0 23.5
31 no female low 78 62 164 53.9 20.0
42 no male low 103 66 189 83.6 23.4

2.1 Descriptives

Neither the mean nor the variance of age and rest changed, since these variables have no missing values.

The mean of active is also almost entirely unaffected. The variance of active changed only marginally in the observed data, and this difference is simply due to sampling variability (we’ve deleted about 40% of the observations). The missing values in active are MCAR, so we would not expect any substantial changes in the marginal distribution of active.

The mean of height is also almost entirely unaffected. The variance of height changed a bit in the observed data, but this difference is simply due to sampling variability (we’ve deleted about 30% of the observations). The missing values in height are MCAR, so we would not expect any substantial changes in the marginal distribution of height.

The mean of weight is also almost entirely unaffected. The variance of weight changed a bit in the observed data, but this difference is simply due to sampling variability (we’ve deleted about 57% of the observations). The missing values in weight are MCAR, so we would not expect any substantial changes in the marginal distribution of weight.

The mean of bmi is also almost entirely unaffected. The variance of bmi changed a bit in the observed data, but this difference is simply due to sampling variability (we’ve deleted about 30% of the observations). The missing values in bmi are MCAR, so we would not expect any substantial changes in the marginal distribution of bmi.

Table 2.2: Means and variances in true and observed dataset
Variables \(M_{obs}\) \(M_{true}\) \(Var_{obs}\) \(Var_{true}\)
Age 38.52 38.52 149.73 149.73
Active 92.58 93.13 383.05 383.04
Rest 69.83 69.83 120.78 120.78
Height 174.50 173.99 100.66 105.29
Weight 73.91 73.58 260.26 274.85
Bmi 24.11 24.06 12.91 13.38
Note. obs = observed dataset, true = true dataset.
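
The comparison in Table 2.2 can be sketched on a small scale. The block below is a minimal illustration that uses only the five rows printed in Table 2.1, so its numbers will not match Table 2.2, which is based on the full dataset. The names `observed`, `true`, and `describe` are introduced here for illustration, and missing values are simply dropped before computing each statistic, which is an assumption about how the observed statistics were obtained.

```python
from statistics import mean, variance

# First five rows of Table 2.1; None marks a missing value (NA).
observed = {
    "active": [None, None, 109, 78, None],
    "height": [None, None, 182, 164, 189],
    "weight": [None, None, 78.0, 53.9, None],
    "bmi": [22.4, 23.8, 23.5, 20.0, 23.4],
}
true = {
    "active": [94, 86, 109, 78, 103],
    "height": [161, 184, 182, 164, 189],
    "weight": [58.1, 80.6, 78.0, 53.9, 83.6],
    "bmi": [22.4, 23.8, 23.5, 20.0, 23.4],
}


def describe(values):
    """Return (n, mean, sample variance) over the non-missing values."""
    present = [v for v in values if v is not None]
    return len(present), mean(present), variance(present)


for name in observed:
    n_obs, m_obs, v_obs = describe(observed[name])
    n_true, m_true, v_true = describe(true[name])
    print(f"{name}: n {n_obs}/{n_true}, mean {m_obs:.2f} vs {m_true:.2f}, "
          f"var {v_obs:.2f} vs {v_true:.2f}")
```

With so few rows the observed and true statistics can differ considerably; the point is only to show the mechanics of comparing available-case summaries against the complete data.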

Over here categorical data descriptions

2.2 Correlations

Table 2.3: Correlations of observed data
age smoke sex intensity active rest height weight bmi
age 1.00 0.01 -0.17 0.21 -0.49 -0.39 0.19 0.25 0.18
smoke 0.01 1.00 -0.09 -0.29 0.15 0.23 0.18 0.18 0.18
sex -0.17 -0.09 1.00 -0.09 0.11 0.06 -0.73 -0.68 -0.42
intensity 0.21 -0.29 -0.09 1.00 -0.37 -0.55 0.13 0.12 0.02
active -0.49 0.15 0.11 -0.37 1.00 0.56 0.00 0.01 0.05
rest -0.39 0.23 0.06 -0.55 0.56 1.00 -0.20 -0.12 0.06
height 0.19 0.18 -0.73 0.13 0.00 -0.20 1.00 0.78 0.34
weight 0.25 0.18 -0.68 0.12 0.01 -0.12 0.78 1.00 0.88
bmi 0.18 0.18 -0.42 0.02 0.05 0.06 0.34 0.88 1.00
Table 2.4: Correlations of true data
age smoke sex intensity active rest height weight bmi
age 1.00 -0.05 -0.17 0.21 -0.54 -0.39 0.20 0.23 0.20
smoke -0.05 1.00 -0.11 -0.31 0.18 0.27 0.17 0.25 0.24
sex -0.17 -0.11 1.00 -0.09 0.09 0.06 -0.72 -0.69 -0.47
intensity 0.21 -0.31 -0.09 1.00 -0.37 -0.55 0.12 0.06 0.01
active -0.54 0.18 0.09 -0.37 1.00 0.61 -0.10 0.02 0.09
rest -0.39 0.27 0.06 -0.55 0.61 1.00 -0.15 -0.04 0.05
height 0.20 0.17 -0.72 0.12 -0.10 -0.15 1.00 0.77 0.36
weight 0.23 0.25 -0.69 0.06 0.02 -0.04 0.77 1.00 0.87
bmi 0.20 0.24 -0.47 0.01 0.09 0.05 0.36 0.87 1.00
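
The source does not state how missing values were handled when computing the observed correlations. One common choice is pairwise deletion, where each correlation uses all rows in which both variables are observed. The sketch below illustrates that idea on the five rows of Table 2.1; the function name `pairwise_pearson` is hypothetical, and pairwise deletion itself is an assumption rather than the documented method.

```python
from math import sqrt


def pairwise_pearson(x, y):
    """Pearson correlation over rows where both x and y are observed.

    Returns (number of complete pairs, correlation).
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return n, sxy / sqrt(sxx * syy)


# height and bmi from the first five rows of Table 2.1 (None = NA).
height_obs = [None, None, 182, 164, 189]
height_true = [161, 184, 182, 164, 189]
bmi = [22.4, 23.8, 23.5, 20.0, 23.4]

n_obs, r_obs = pairwise_pearson(height_obs, bmi)
n_true, r_true = pairwise_pearson(height_true, bmi)
print(f"observed: n = {n_obs}, r = {r_obs:.2f}")
print(f"true:     n = {n_true}, r = {r_true:.2f}")
```

Because each pair of variables can have a different number of complete rows, the entries of a pairwise-deletion correlation matrix are not all based on the same sample.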

2.3 Regression

Table 2.5: Regression analysis of True and Observed Data
\(\beta_{obs}\) \(SE_{obs}\) \(p_{obs}\) \(\beta_{true}\) \(SE_{true}\) \(p_{true}\)
(Intercept) 78.444 14.34 0.000 80.384 9.03 0.000
age -0.809 0.11 0.000 -0.883 0.07 0.000
bmi 1.681 0.55 0.003 1.776 0.35 0.000
sexfemale 32.756 20.78 0.117 43.460 14.16 0.002
smokeyes 1.615 2.91 0.580 3.516 1.99 0.078
bmi:sexfemale -1.131 0.88 0.199 -1.674 0.60 0.006
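
The terms in Table 2.5 correspond to a linear model with a bmi by sex interaction. The table does not name the outcome, so it is written here as \(y\); the symbols \(\beta_0, \dots, \beta_5\) and the indicator names are notation introduced for this sketch.

\[
y = \beta_0 + \beta_1\,\text{age} + \beta_2\,\text{bmi} + \beta_3\,\text{female} + \beta_4\,\text{smoke} + \beta_5\,(\text{bmi} \times \text{female}) + \varepsilon
\]

Under this model the slope of bmi is \(\beta_2\) for males and \(\beta_2 + \beta_5\) for females. Using the estimates in Table 2.5:

\[
\text{true: } 1.776 + (-1.674) = 0.102, \qquad \text{observed: } 1.681 + (-1.131) = 0.550
\]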

3 Missingness

There are 540 missing values in total: 0 for age, sex, intensity, and rest; 58 for smoke; 92 for height; 93 for bmi; 123 for active; and 174 for weight. Moreover, there are 132 completely observed rows, 15 rows with one missing value, 37 rows with two missing values, 52 rows with three missing values, 55 rows with four missing values, and 15 rows with five missing values.
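
As a quick consistency check, the counts above can be added up in two ways: per variable and per row. The short sketch below does this with the reported numbers only; the variable names are illustrative. The resulting percentages for active, height, weight, and bmi agree with the approximate deletion rates quoted in Section 2.1.

```python
# Missing values per variable, as reported in the text.
missing_per_variable = {
    "age": 0, "sex": 0, "intensity": 0, "rest": 0,
    "smoke": 58, "height": 92, "bmi": 93, "active": 123, "weight": 174,
}
# Number of rows with k missing values, as reported in the text.
rows_by_missing_count = {0: 132, 1: 15, 2: 37, 3: 52, 4: 55, 5: 15}

n_rows = sum(rows_by_missing_count.values())
total_by_variable = sum(missing_per_variable.values())
total_by_row = sum(k * n for k, n in rows_by_missing_count.items())

print(f"rows: {n_rows}")
print(f"missing values (by variable): {total_by_variable}")
print(f"missing values (by row):      {total_by_row}")

for name, count in missing_per_variable.items():
    if count:
        print(f"{name}: {100 * count / n_rows:.0f}% missing")
```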


Figure 3.1: pattern of the missingness




3.1 Looking for the missingness

weight: \(t =\) 0.381, \(p =\) 0.704

height: \(t =\) 1.271, \(p =\) 0.205

bmi: \(t =\) 0.336, \(p =\) 0.737

active: \(t =\) -0.606, \(p =\) 0.545

smoke: \(\chi^2 =\) 1.154, \(p =\) 0.283


Figure 3.2: comparing the distribution of the observed and true dataset

3.2 Missingness of weight

missing weight on sex: \(\chi^2 =\) 0, \(p =\) 1

missing weight on smoke: \(\chi^2 =\) 0.036, \(p =\) 0.848

missing weight on intensity: \(\chi^2 =\) 2.589, \(p =\) 0.274

missing weight on rest: \(t =\) -0.482, \(p =\) 0.63

missing weight on age: \(t =\) -0.59, \(p =\) 0.556

missing weight on height: \(t =\) -0.639, \(p =\) 0.525

missing weight on bmi: \(t =\) -0.012, \(p =\) 0.99

missing weight on active: \(t =\) -1.44, \(p =\) 0.156
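
Each \(t\) value above compares a continuous variable between rows where weight is missing and rows where it is observed. The source does not say which variant of the t-test was used or in which direction the difference was taken; the sketch below assumes Welch's unequal-variance version and computes only the statistic and its approximate degrees of freedom. It uses the five rows from Table 2.1, so the result is not comparable to the values reported for the full dataset. The function name `welch_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1)
    )
    return t, df


# weight (None = NA) and rest from the first five rows of Table 2.1.
weight = [None, None, 78.0, 53.9, None]
rest = [75, 62, 76, 62, 66]

# Split rest by the missingness indicator of weight.
rest_when_observed = [r for w, r in zip(weight, rest) if w is not None]
rest_when_missing = [r for w, r in zip(weight, rest) if w is None]

t, df = welch_t(rest_when_observed, rest_when_missing)
print(f"t = {t:.3f}, df = {df:.2f}")
```

A value of \(t\) close to zero, as in most of the tests above, gives no evidence that the missingness of weight depends on the other variable.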


Figure 3.3: Looking whether the missingness of weight is MAR

3.3 Missingness of height

missing height on sex: \(\chi^2 =\) 0, \(p =\) 1

missing height on smoke: \(\chi^2 =\) 0.111, \(p =\) 0.739

missing height on intensity: \(\chi^2 =\) 3.563, \(p =\) 0.168

missing height on rest: \(t =\) 0.242, \(p =\) 0.809

missing height on age: \(t =\) 0.32, \(p =\) 0.749

missing height on bmi: \(t =\) -0.012, \(p =\) 0.99

missing height on active: \(t =\) -1.535, \(p =\) 0.137


Figure 3.4: Looking whether the missingness of height is MAR

3.4 Missingness of Active

missing active on sex: \(\chi^2 =\) 1.957, \(p =\) 0.162

missing active on smoke: \(\chi^2 =\) 0.293, \(p =\) 0.589

missing active on intensity: \(\chi^2 =\) 2.193, \(p =\) 0.334

missing active on rest: \(t =\) -1.558, \(p =\) 0.12

missing active on age: \(t =\) -0.963, \(p =\) 0.337

missing active on height: \(t =\) -0.232, \(p =\) 0.817

missing active on bmi: \(t =\) -1.883, \(p =\) 0.062

missing active on weight: \(t =\) -1.948, \(p =\) 0.059


Figure 3.5: Looking whether the missingness of active is MAR

3.5 Missingness of Bmi

missing bmi on sex: \(\chi^2 =\) 0.019, \(p =\) 0.889

missing bmi on smoke: \(\chi^2 =\) 0, \(p =\) 1

missing bmi on intensity: \(\chi^2 =\) 1.476, \(p =\) 0.478

missing bmi on rest: \(t =\) 0.021, \(p =\) 0.983

missing bmi on age: \(t =\) -0.368, \(p =\) 0.713

missing bmi on height: \(t =\) -0.639, \(p =\) 0.525

missing bmi on active: \(t =\) -0.717, \(p =\) 0.478


Figure 3.6: Looking whether the missingness of bmi is MAR

3.6 Missingness of Smoke

missing smoke on sex: \(\chi^2 =\) 5.037, \(p =\) 0.025

missing smoke on intensity: \(\chi^2 =\) 1.722, \(p =\) 0.423

missing smoke on rest: \(t =\) 0.779, \(p =\) 0.438

missing smoke on age: \(t =\) -1.271, \(p =\) 0.208

missing smoke on height: \(t =\) -0.347, \(p =\) 0.731

missing smoke on bmi: \(t =\) -1.338, \(p =\) 0.188

missing smoke on weight: \(t =\) -0.785, \(p =\) 0.444
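
The p-values of the two categorical tests in this subsection (sex and intensity) can be recovered from the reported statistics. The sketch below assumes that the sex test has 1 degree of freedom (two levels of sex by missing or observed) and that the intensity test has 2 degrees of freedom (three intensity levels); neither is stated in the source. For these two cases the upper tail of the chi-square distribution has a closed form, so only the standard library is needed. The function name `chi2_p_value` is hypothetical.

```python
from math import erfc, exp, sqrt


def chi2_p_value(statistic, df):
    """Upper-tail p-value of a chi-square statistic for df = 1 or df = 2."""
    if df == 1:
        # A chi-square(1) variable is the square of a standard normal.
        return erfc(sqrt(statistic / 2))
    if df == 2:
        # A chi-square(2) variable is exponential with mean 2.
        return exp(-statistic / 2)
    raise ValueError("closed form implemented for df = 1 and df = 2 only")


# (label, reported statistic, assumed df, reported p-value)
reported = [
    ("missing smoke on sex", 5.037, 1, 0.025),
    ("missing smoke on intensity", 1.722, 2, 0.423),
]

for label, statistic, df, p_reported in reported:
    p = chi2_p_value(statistic, df)
    print(f"{label}: p = {p:.3f} (reported {p_reported})")
```

Under these assumptions the recomputed p-values agree with the reported ones to three decimals, which supports reading the sex result (\(p =\) 0.025) as the only test in this subsection below the conventional 0.05 level.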


Figure 3.7: Looking whether the missingness of smoking is MAR